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Abstract 

Regulatory networks have evolved to allow gene expression to rapidly track changes in the environ- 
ment as well as to buffer perturbations and maintain cellular homeostasis in the absence of change (l}[3] . 
Theoretical work and empirical investigation in Escherichia coli have shown that negative autoregula- 
tion confers both rapid response times and reduced intrinsic noise (^|7], which is reflected in the fact 
that almost half of Escherichia coli transcription factors are negatively autoregulated [8 - 10 . However, 



negative autoregulation is exceedingly rare amongst the transcription factors of Saccharomyces cere- 
visiae [9-13 . This difference is all the more surprising because E. coli and S. cerevisiae otherwise have 



remarkably similar profiles of network motifs 13 . In this study we first show that regulatory interac- 
tions amongst the transcription factors of Drosophila melanogaster and humans have a similar dearth of 
negative autoregulation to that seen in S. cerevisiae. We then present a model demonstrating that this 
fundamental difference in the noise reduction strategies used amongst species can be explained by con- 
straints on the evolution of negative autoregulation in diploids. We show that regulatory interactions 
between pairs of homologous genes within the same cell can lead to under-dominance — mutations 
which result in stronger autoregulation, and decrease noise in homozygotes, paradoxically can cause 
increased noise in heterozygotes. This severely limits a diploid's ability to evolve negative autoregula- 
tion as a noise reduction mechanism. Our work offers a simple and general explanation for a previously 
unexplained difference between the regulatory architectures of E. coli and yeast, Drosophila and hu- 
mans. It also demonstrates that the effects of diploidy in gene networks can have counter-intuitive 
consequences that may profoundly influence the course of evolution. 



Author Summary 

All genes have to deal with intrinsic noise, and a variety of mechanisms have evolved to reduce it. One 
important mechanism of noise reduction for transcription factors is negative autoregulation, in which a 
gene product represses its own rate of transcription. Negative auotregulation occurs frequently in E. coli 
but, we find, occurs extremely rarely in S. cerevisiae, D. melanogaster and humans. Whilst there are a 
great many important differences in the genetic architectures of these organisms, they tend to share, with 
the exception of negative autoregulation, similar profiles of network motifs. This makes the discrepancy 
in the degree of negative autoregulation all the more striking, as it lacks any obvious explanation. Our 
study presents a potential explanation, by comparing the evolvability of negative autoregulation as a noise 
reduction mechanism in haploids and diploids. We show that, in diploids, mutations that increase the 
strength of negative autoregulation at one gene copy often increase overall noise in gene expression. This 
results in under-dominance, in which heterozygotes are less fit than homozygotes. The result is that the 
evolution of negative autoregulation in diploids is significantly constrained. We verify our results using a 
combination of detailed molecular simulations and evolutionary simulations 
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Introduction 



Negative autoregulation is a network motif in which a transcription factor inhibits its own expression. 
Theoretical work has shown that this type of regulation reduces intrinsic noise and quickens the response 
time to environmental perturbations (4|[6|[7| and experiments using artificial gene regulatory circuits 
in E. coli have confirmed these predictions [6]. Negative autoregulation therefore represents a simple 
yet powerful mechanism to maintain cellular homeostasis in the face of environmental and metabolic 
perturbations and reduce the often substantial fitness costs that noise can incur pj. Different organisms, 
however, vary a great deal in their use of the motif. In E. coli, close to 50% of transcription factors (82 
out of 182) (8}{10}[14| have been shown to negatively autoregulate. In contrast, negative autoregulation is 



almost entirely absent from the transcription factors of S. cerevisiae (3 out of 169) |9-13 



How can we account for this striking discrepancy? In order to answer this, we looked at the extent 
to which negative autoregulation is used in other species. We interrogated systematic datasets on the 
regulatory interactions amongst the transcription factors of D. melanogaster and humans and found a 
similar pattern to that observed in yeast: in D. melanogaster 3 out of 87 15-17 and in humans 5 out of 
301 16-18 transcription factors autoregulate (see SI, Table S1-S3). Currently, there is no obvious way to 
account for this striking discrepancy between these organisms, despite widespread interest in the strategies 
they employ to tackle noise (l}|7]. Here we develop a model, founded in biophysics, for the evolution of 
negative autoregulation in diploid species. We use it to argue that a dearth of autoregulating genes in 
yeast, flies and humans can be explained by constraints on the evolution of negative autoregulation that 
arise due to diploidy. 



Results 

Gene expression under negative autoregulation 

Previous theoretical work on the dynamics of gene expression under negative autoregulation has consid- 
ered single genes and so is implicitly haploid (4||7]. Such models exclude the more complex interactions 
that occur due to cross-regulation between homologous gene copies within a diploid cell (Fig. 1). Here 
we characterise the expression dynamics and regulatory evolution of homologous pairs of negatively au- 
toregulating genes, taking into account the cross-talk between alleles. 



2 




^ Binding site 

(a) 




L , , , - J n=5 

0.2 0.4 0.6 0.8 1.0 
Protein concentration, p/p^^ 

(b) 



Figure 1: Cross-talk in diploid autoregulators (a) Schematic representation of negative autoregula- 
tion when one (left) and two (right) copies of a gene are present in a cell. In the haploid the amount of 
negative autoregulation the gene experiences depends on on its own expression level. In the diploid, two 
gene copies arc present (shown as light gray and dark gray) , and the amount of negative autoregulation 
experienced by each gene depends on the expression level of both genes combined. If the two gene copies 
differ from one another in the strength of their transcription factor binding sites, complex dynamics can 
arise that are not observed in haploids. (b) Illlustration of variation in the repression function, (j){p), with 
protein concentration for different Hill coefficients, n = 1 (sold line), n = 2 (small dashes) and n = 5 
(large dashes). 
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We model negative autoregulation in a diploid using a set of ordinary differential equations that track 
changes in the mRNA and protein concentrations for each of a pair of alleles (labelled with subscripts 
i and j), r^, rj, pi and pj. The total concentration of mRNA and protein in the diploid cell are given 
by the summed output of the two alleles r = r j + rj and p = Pi + Pj- Changes in mRNA and protein 
concentrations for the pair of alleles over time are given by 

^ = k + (j)i{p) - JrTi, 
— = ki + (f>j{p) -JrVj, 

dpi _ , 

— fiKp JpPt, 

^ = rjkp--fppj. (1) 



According to these equations, mRNA is transcribed at a (usually low) constant background rate ki, plus 
a rate (/)(p) due to negative autoregulation, that decreases as the total cellular protein level p = pi + pj 
increases. Protein is produced from mRNA at the rate of translation kp, whilst protein and mRNA 
degrade with rates 7^ and 7^, respectively. 

As in previous work (4|[6j, we model the repression function (j){p) in Eqs. [l]as a Hill function 

4>iip) ^° 



where Ki is the dissociation constant associated with the autoregulating transcription factor binding site. 
Smaller values of K (lower rates of dissociation) indicate a steeper slope and stronger regulation. The 
Hill coefficient n governs the steepness of the function at the inflection point and hence determines how 
step-like regulation will be. In systems where transcription is regulated by a single binding site, 4>{p) has 
a Michaelis-Menten-like form, corresponding to a Hill coefficient of n = 1 (6| [l9^,20j . A single binding site 
is the simplest and most relevant case for evolving negative autoregulating, and it is the one we focus on 
here. For completeness, we analyse the more general case of arbitrary Hill coefficient in the Appendix 
and in the Supporting Information we show that our results also hold for different values of n. 

In the absence of negative autoregulation (i.e., (/>(p) = ko), mRNA is produced at the maximum rate 
of transcription ki + ko. In this case, concentrations of mRNA and protein reach equilibrium values of 
Tmax = and Pmax = fmaxT^- Starting from these values, equilibrium mRNA and protein levels 

decrease with increasing autoregulatory binding strength (decreasing K). The minimum mRNA and 
protein levels are reached when negative autoregulation is strongest (i.e. (j){p) — )• as K — )• 0). The 
resulting minimum equilibrium concentrations are rmin = ^ and Pmin = rmin ^ • 

rr IP 

Evolution of negative autoregulation for homeostasis and faster response times 

In order to analyse the evolution of autoregulatory binding sites we consider two separate but related func- 
tions of negative autoregulation: faster response times and maintaining mRNA and protein homeostasis. 
First, to study the evolution of negative autoregulation for faster response times, we simply equate the 
fitness of a system with its response time (i.e the time taken to return to equilibrium following a pertur- 
bation). We use Eqs. 1 to infer selection pressures on the strength of autoregulation, i.e., the dissociation 
constant K, by analysing how quickly genotypes with different autoregulatory binding strength return 
to equilibrium following a perturbation in protein level. To do this we calculate a genotype's "response 
time": the time taken for cellular protein concentration to return to equilibrium following a perturba- 
tion. We model perturbations as a reduction of the protein level to a fraction a of the equilibrium level. 
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The value of a varies continuously between and 1 to encompass both small perturbations, for example 
those resulting from intrinsic noise in transcription and translation (q ~ 1), and larger perturbations, for 
example those resulting from resource deprivation in the environment or following cell division jlj. We 
present results derived from numerical analysis of Eqs. [ijthat are applicable to perturbations of any size. 
These are complemented with an analytical treatment of the response time of the system to small per- 
turbations, based on its maximal eigenvalue, |A| (see Appendix), which allows us to develop an intuition 
for how autoregulating genes in diploids respond to perturbations. 

To study the evolution of negative autoregulation for homeostasis, we turn to stochastic simulations 
of negatively autoregulating genes, which allow us to assess the amount of intrinsic noise associated with 
gene expression. Previous work has shown that negative autoregulation can help maintain homeostasis in 
gene expression by reducing the amount of intrinsic noise in negatively autoregulating genes, compared 
to other genes [7]. In fact, reducing the response time of a gene to very small perturbations away from 
equilibrium, also decreases the intrinsic noise in gene expression. Therefore, the two functions of negative 
autoregulation we consider (producing faster response times and reduced intrinsic noise) are highly inter- 
related. To study the evolution of negative autoregulation for lower intrinsic noise, we equate the fitness 
of the system with the amount of intrinsic noise it displays (i.e the ratio of the variance in gene expression 
to the mean gene expression level). We infer selection pressures on the strength of autoregulation, i.e., 
the dissociation constant by the intrinsic noise of genotypes with different autoregulatory binding 
strengths. These are determined by performing Monte Carlo simulations for a full, molecular model of 
transcription, translation and autoregulation (see Materials and Methods). 

Response time in homozygotes 

We first compare the response times of two homogozyotes whose alleles are identical in every respect except 
for the dissociation constant. One homozygote carries two copies of a resident allele with dissociation 
constant the other carries two mutant alleles that have a decreased dissociation constant = 
i^iexp[— e] (with e > 0) and hence stronger autoregulatory binding. Numerical analysis of the system 
shows that homozygotes for the more strongly autoregulating allele (with K2) respond more quickly than 
homozygotes for the more weakly autoregulating allele (with Ki, Fig. 2a). This is true up to a value 
of ~ Kopt, which provides the fastest response time attainable by the system and hence provides the 
optimal binding strength. Further increases in regulation beyond this value are not favoured because 
they lead to overshooting the optimal binding strength. These results for diploid homozygotes mirror 
those obtained for haploids j6] (see Appendix) and show that regulatory interactions between pairs of 
identical alleles do not, in themselves, diminish the beneficial effects of negative autoregulation. Negative 
autoregulation can therefore, in principle, function as a mechanism to produce faster response times in 
diploids just as it does in haploids. 
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Figure 2: Invasibility of autoregulatory binding sites The response time of mutant (a) homozygotes 
and (b) hcterozygotes are shown. Different values of the binding strength of the resident ahele, in units 
oipmax/K (x-axis), are plotted against mutations to binding site strength e of different size (y-axis). 
Thus the graphs compare a resident allele, Ki with a mutant allele, = Ki exp[— e]. Mutations falling 
into white region result in decreased response time in the carrier compared to resident genotype and are 
favoured by selection; mutations falling into the gray region result in increased response time and are not 
favoured by selection. Weak binding occurs when Pmax/K > 10° [4,6]. Response times were calculated 
by numerically integrating Eq. 1 from zero protein concentration to 90% of the equilibrium. The optimal 
binding strength in these graphs is Pmax/K = 1250, corresponding to a background transcription rate 

hhp = 10-3. 
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Response time in heterozygotes 

The results above depend on comparing homozygotes for alleles with different dissociation constants, Ki 
and i^2- The evolution of negative autoregulation, however, must occur through the stepwise accumulation 
of new mutations that are initially rare and found only in heterozygotes. In order to assess whether 
autoregulation can evolve in diploids, we therefore need to determine whether a mutant allele with a 
stronger binding site {K2) will confer a selective advantage to a heterozygote that also carries a resident 
allele with a weaker binding site (i^i). A mutation will be favoured and increase in frequency if a 
heterozygote is able to respond more quickly to perturbations than a homozygote carrying two copies of 
the more weakly binding resident allele. 

Numerical analysis of Eqs. 1 reveals that heterozygotes often have greater response times than ho- 
mozygotes with the more weakly binding resident allele. Fig. 2b shows that heterozygotes only have 
improved response times when the resident allele binding strength is weak {K > Pmax (4||6]) or if the 
effect of a mutation that increases binding strength is small (e is small). As the resident allele bind- 
ing strength increases (i.e. Pmax/K increases) an ever larger range of mutation sizes result in increased 
heterozygote response times (Fig. 2b), resulting in under- dominance (i.e. heterozygote disadvantage). 
Typicaly mutation sizes e for transcription factor binding sites are in the range 1 < e < 3, |20| - [22| . In this 
range regulatory mutations are subject to under-dominance and increasingly so as the binding strength 
of the resident allele increases. As a consequence, the maximum binding strength that can evolve is likely 
to be significantly lower than in haploids (Fig. 2). Based on these results, we expect under-dominance to 
pose a significant barrier to the evolution of negative autoregulation in diploids. 

To better understand why under-dominance arises in this system, we calculated the eigenvalues as- 
sociated with Eqs. 1. These provide a measure of the rate at which the system returns to equilibrium 
following a small perturbation, and allow us to elucidate the relative contributions of the different alleles 
to the response dynamics of the gene pair. The maximal eigenvalue of Eqs. 1 for a heterozygote, |A/iet|) 
can be expressed as 

V 

\^het\ = |A/iom| ^ — , (2) 

Phet 

(see Appendix) where V is the squared difference of the mean steady state expression levels of the two al- 
leles in the heterozygote and | Xhom I is the maximal eigenvalue of a homozygote with protein concentration 
equal to that of the heterozygote at equilibrium, p^^j (see Appendix). Eq. 2 says that, even if increasing 
autoregulatory binding strength leads to a faster response time in a homozygote, this advantage is offset 
in the heterozygote by an amount V^/p/^g^, which measures how different the expression levels of the two 
alleles are (it is analogous to the Fano factor, a measure of the spread in a probability distribution [7]). 
As the difference in the expression of the alleles increases, V/p*^^^ increases from to a maximum p*^^^/2. 

We can understand why increasing the difference in allelic expression results in increased response time 
by considering the contribution of the individual alleles to the response time of the gene pair (Fig. 3). 
The level of negative autoregulation at each allele depends on the strength of its binding site and the 
amount of protein product present in the cell. In a heterozygote, the allele with the stronger binding site 
is more strongly suppressed (compared to the same allele in a homozygote), since there is more protein 
available to bind to it. At the same time, the allele with the weaker binding site is less strongly suppressed 
compared to the same allele in a homozygote. As a result, the allele with the stronger binding site has 
a faster response time than in a homozygote, whilst the allele with the weaker binding site has a slower 
response time than in a homozygote. However, the overall effect tends to be to increase the response time 
of the heterozygote, because the dynamics of protein expression in the heterozygote are dominated by the 
allele with the weaker binding site (Fig. 3). 
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Figure 3: Response times and allele expression This figure shows quantitative results for the 
contributions of different alleles to expression and to response time, (a) Expression level of the resident 
allele (black line) and the mutant allele (red line) in the heterozygote relative to the resident allele in 
the homozygote. As binding strength increases the resident allele is over-expressed, (b) Response times 
for individual alleles (time to return to 90% of the equilibrium expression level) in the heterozygote. 
The response time of the resident allele (black line) and the mutant allele (red line) in the heterozygote 
are shown relative to the response time of the resident allele in the homozygote. The resident allele in 
the heterozygote shows an increased response time with increasing binding strength. Mutant alleles in 
these graphs have dissociation constant Erexp[— 2], and the optimal binding strength in these graphs is 
Pmax/K = 1250, corresponding to a background transcription rate h/jp = 10~^. 
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Evolution of faster response times 



Under-dominance for response time occurs across a wide range of parameter values, but can be avoided 
if mutations have small effects on binding site strength (Fig. 2b). To determine whether a series of 
mutations with small effect could offer a feasible way for genes to evolve strong negative autoregulation 
in diploids, we carried out simulations of binding site evolution that incorporated established properties 
of real binding sites. 

Transcription factor binding sites in eukaryotes vary between 5 and ~ 30 nucleotides in length, with an 
average of 10 nucleotides |23j. They have a small number of optimal sequences that bind the transcription 
factor with maximum affinity (20| |22p4p 5] . The binding strength of a site can be expressed as a function of 
the total binding energy E of its sequence, so = exp[— E']. This total binding energy is generated by the 
additive contributions of individual nucleotides to overall binding, E = Yli^i- Individual contributions 
are set to = for nucleotides that do not match the optimal sequence and > for matched 
nucleotides (20}{22]. 

Based on these properties, we performed simulations of the evolution of an autoregulatory binding site 
under selection for decreased response time. These took into account the empirical distribution of binding 
site length in model eukaryotes and the variation in contributions to binding strength across the binding 
site sequence (see Materials and Methods). The values of ej were drawn from a uniform distribution in 



the interval (0,3). This sampling covers the empirically estimated range 1 < ej < 3 [20-22 . It also 



ensures that mutations of small effect (e < 1) occur frequently and so allows for the possibility that 
autoregulation could evolve via the accumulation of mutations with small effect. Evolution was started 
from a state of minimum affinity (all nucleotides non-optimal) and proceeded through a series of single 
nucleotide substitutions. A mutant was assumed to go to fixation if it resulted in a response time less 
than or equal to that of the resident. Simulations were carried out for both haploids and diploids (for 
which the response time of mutants was evaluated in the heterozygote state). 

The results (Fig. 4) confirm that under-dominance strongly constrains the evolution of negative au- 
toregulation in diploids. Haploids readily evolved binding sites with dissociation constants close to Kept- 
In contrast, the average binding strength in diploids was around 100 times weaker than Kopt and only 
a small proportion of sites reached binding strengths comparable to those of haploids. This shows that 
under realistic conditions, diploids will rarely be able to evolve the level of autoregulation observed in 
haploids. 
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Haploid 




Figure 4: Evolution of autoregulatory binding sites Distribution of binding site strength achieved 
in evolutionary simulations for haploids (gray) and diploids (white) . Haploids are able to evolve stronger 
binding than diploids. The histograms shows results of 10^ replicate simulations for each ploidy level. The 

simulation procedure is described in the main text and the Materials and Methods. The optimal binding 
strength used was Pmax/K = 1250, corresponding to a a background transcription rate ki/^p = 10~^. 
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Figure 5: Intrinsic noise in gene expression The figure shows quantitative results for the intrinsic 
noise of autoregulating genes, as measured by the ratio of the variance to mean expression in protein 
concentration at equihbrium. (a) Percentage change in the noise of a heterozygote compared to the 
resident homozygote. These are shown for different Hill coefficients, n = 1 (black), n = 2 (red) and n = 3 
(blue). Mutations become deleterious in the heterozygote when Pmax/K > 1. (b) Percentage change in 
the noise of a mutant homozygote compared to the resident homozygote. Mutations become deleterious in 
the mutant homozygote when Pmax/K is about 10. The graphs show the results of stochastic simulations 
(see Materials and Methods) for parameter values typical for transcription factors [7], kr = O.Ols"^, 
kp = 0.17s^^, ki = O.OOls"-^, 7r. = ^s~^ and 7p = gjl^o^ ""^^ '^^^ resident homozygote has binding 
strength Pmax/ K (as indicated by the x-axis), mutations are of size e = 2. 
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Intrinsic noise in diploids 



In order to investigate the evolution of negative autoregulation as a mecahnism to reduce intrinsic noise in 
diploids, we turned to stochastic simulations. Intrinsic noise in gene expression occurs because transcrip- 
tion and translation are inherently noisy processes. As a result of this noise, all genes experience constant 
fluctuations in their mRNA and protein levels. The greater intrinsic noise associated with a particular 
gene, the higher the variance in its expression level relative to the mean. Therefore, a natural way to 
characterise the amount of intrinsic noise associated with a gene is to measure the ratio of the variance 
to the mean expression level at equilibrium (known as the Fano factor) We performed molecular 
simulations that capture transcription, translation and degredation in the presence of negative autoreg- 
ulation (see Materials and Methods). Just as in our analysis of response times, we compared a resident 
allele with dissociation constant Ki, to a mutant allele with dissociation constant K2 = Ki exp[— e]. We 
compared the intrinsic noise (as measured by the Fano factor) in the resident homozygote to that of the 
heterozygte and the mutant homozygote, and thus determined whether under-dominance occurs in the 
evolution of negative autoregulation as a mechanism to reduce intrinsic noise. The results are shown in 
Fig. 5. We find once again that under-dominance occurs. Whereas the optimal binding strength for a 
single negatively autoregulating binding site is found to be Pmax/K ~ 10, the maximum evolvable binding 
strength (i.e that which can evolve without encountering under-dominance) is found to be Pmax/K ~ 1, 
an order of magnitude weaker. A similar pattern occurs when steeper Hill coefficients are considered 
(Fig. 5). Therefore we conclude that under-dominance poses a barrier to the evolution of strong negative 
autoregulation both as a mechanism to speed response times and to reduce intrinsic noise. 



The effects of mutations to other parameters 

To test the generality of our findings, we also considered variation in other parameters (see SI Fig. S1-S5 
and Text). We first relaxed our assumption of a single binding site and explored the case of Hill coefficients 
n > 1, implying regulation through multiple, cooperatively acting binding sites. In line with the eff'ect of 
increasing binding strength through changes in K, we find that mutations increasing the Hill coefficient 
are subject to under-dominance (see SI Fig. S1-S2 and Text). Therefore, a mutation that increases 
the strength of negative autoregulation are subject to the same evolutionary constraints, independent of 
whether they increase regulation by changing the dissociation constant K or the Hill coefficient n. 

We also considered variation in the rates of mRNA and protein degradation (7,, and 7p) to see whether 
they provide conditions in which the effects of under-dominance on autoregulatory binding strength can 
be avoided (see SI Fig. S4 and Text). Variation in the rate of mRNA or protein degradation did not 
remove the tendency for mutations that increase autoregulatory binding strength to be subject to under- 
dominance. However, it has been pointed out in other work (2||26 , faster rates of protein degradation result 



in faster response times, and regulation of protein degradation can reduce noise. As might be expected, 
the constraints we describe on the evolution of response times through stronger negative autoregulation 
do not preclude the evolution of response times through other mechanisms, such as changes in protein 
degradation rates. 



Discussion 

Negative autoregulation is found to occur in 46% of E. coli transcription factors |4-7], but is rare in 
other species for which systematic data on transcriptional regulation is available, occurring in <2% of 
the transcription factors of yeast, Drosophila and humans (see SI, Table S1-S3). We have argued that, 
at least in part, this difference can be understood by considering the different evolutionary dynamics 
of autoregulating genes in haploids and diploids: selection for genes to have a decreased response time 
to perturbations favours negative autoregulation in haploids, but under-dominance tends to prevent the 
evolution of stronger autoregulatory binding sites for this purpose in diploids. This constraint on the 
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evolution of negative autoregulation in diploids is compelling because it offers a simple and general 
explanation for the near absence of the motif across yeast, humans and flies. Furthermore, it is important 
to note that under-dominance is not built into our model but arises as an emergent property of our 
analysis of regulatory evolution - an analysis that simply extends to diploids previous models that have 
been shown to provide a good description of regulatory behaviour in haploids ^[7] . 

The empirical patterns we present are striking, however it is important to ask weather they can be 
explained by other means than those proposed in this paper. In particular we asked whether negative 
autoregulation is truly under-represented in the yeast, human and Drosophila data sets, as compared to 
E. coli, or whether the apparent reduction in the number of negative autoregulators is due to under- 
representation of genes with repressive function generally. To address this we interrogated each dataset 
to find the number of transcription factors with documented repressor activity. These account for 58 
factors in humans, 37 in Drosophila, 54 in yeast and 82 in E. coli. If we include only transcription 
factors with known repressor function in our analysis, we find that 5 out of 58 (8.6%) genes negatively 
autoregulate in humans, 3 out of 37 (8.1%) in Drosophila, 3 out of 54 (5.6%) in yeast and 82 out of 138 
(59%) in E. coli. Thus, the relative rarity of negative autoregulation in eukaryotes is not due to a general 
underrepresentation of repressive transcription factor effects among the genetic interactions described for 
these species. Instead, they appear to be a true property of their regulatory networks. 

The species for which we were able to collate data show stark differences in the frequency of negative 
autoregulation. However, it would clearly be desirable to extend the scope of the empirical analysis to 
include examples that span the prokaryote-eukaryote divide. Two types of data are of particular interest 
in this respect: haploid genes in diploid species and duplicate genes in haploid species. The constraints 
on the evolution of autoregulation predicted by our model only affect genes for which two homologous 
copies are expressed in the same cell. Haploid genes in a diploid organism should therefore escape the 
evolutionary constraint on negative autoregulation. Unfortunately, the data on genetic regulation are 
currently too sparse to test this prediction with any degree of rigor. The only candidate for a haploid 
gene in our dataset is the human Y- linked transcription factor Sry (see SI Table. S3). However, its mode of 
regulation (positive or negative) is unknown. Duplicate genes in haploids offer a better prospect. Just as 
single copy genes in diploids are predicted to escape under-dominance, multi-copy genes in haploids may 
be subject to evolutionary constraints similar to those we have described for diploids. Interestingly, this 
appears to be supported by data on duplicate genes in E. coli. Although the use of negative autoregulation 
is widespread amongst E. coli transcription factors, duplicates of negative autoregulating genes are no 



more common than among other genes 10 . This is despite the fact that negative autoregulation is 



expected to reduce the deleterious effects of increased dosage following duplication |10j. Although the 
evolutionary dynamics of duplication and divergence are complex 27 , it is interesting to note that our 
model predicts that any divergence in the expression levels of a pair of negatively autoregulating duplicate 
genes will tend to slow the response time of the pair. This is because expression divergence will tend 
to increase the difference in expression across the two genes and thus will increase the response time in 
exactly the same way as we have described for heterozygotes in diploid cells. 

The frequency of autoregulating duplicates in E. coli offers some support for our model by showing 
that, as we would predict, the evolution of multi-copy genes in bacteria is constrained. However, it 
could be argued that the relative dearth of negative autoregulation in yeast, fruit flies and humans is not 
primarily caused by under-dominance, but is rather due to the fact that these organisms are eukaryotes. 
Eukaryotes may simply experience different types of noise, and, accordingly, have different mechanisms for 
dealing with it, making negative autoregulation unnecessary. Although we cannot dismiss this limitation 
of our model entirely, several points are worth noting. The use of response time as a measure of fitness 
makes our model very general. First, any cell has to deal with large perturbations, such as occur following 
cell division. The speed with which the concentration of a transcription factor returns to its equilibrium, 
and the regulatory dynamics allowing it to do so, are important across all levels of biological complexity. 
Second, as we have noted, although our model captures the response time to perturbations and the 
amount of intrinsic noise associated with a gene [T], it does not capture other, extrinsic sources of noise. In 
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particular, eukaryotes tend to be affected by "input noise" that results, for example, from the stochastic 
ON-OFF switching occuring in eukaryotic cells [2 , 28 , and it is positive autoregulation, not negative 
autoregulation, that is best for dealing with such noise [2,28,29 . However, positive autoregulation does 
not feature any more prominently than negative autoregulation within the regulatory networks of the 
three eukaryotes we analysed, with 9 instances in yeast, 16 in humans and 11 in Drosophila. These figures 
are not comparable to the frequency of negative autoregulation in E. coli, indicating that we are not 
simply observing a straightforward shift in the importance of diff'erent types of perturbations. 

It seems unlikely that eukaryotes are completely exempt from dealing with the kind of perturbations 
that would require negative autoregulation. It is possible, however, that they implement autoregulation 
differently, and in a way that would not appear in our systematic screen of existing datasets. When 
using transcription regulation, eukaryotes may be able to achieve some degree of negative autoregulation 
through multiple, weak autoregulatory binding sites, along with cooperation (see Figs. S1-S2). Although 
we are able to show that the evolution of strong cooperative autoregulation is subject to under-dominance 
(Fig. SI), we find that the evolution of multiple, weak autoregulatory binding sites (Fig. S2) are less con- 
strained. Since weak binding sites would likely be under-represented or absent from systematic datasets, 
we cannot rule out the possibility that diploids achieve negative autoregulation in this way, and some 
studies based on sequence conservation do suggest that autoregulatory binding sites in humans may be 
quite widespread [30| . Eukaryotes may also achieve negative autoregulation through mechanisms other 
than direct transcription regulation, for example, through changes in local chromatin structure or covalent 
changes in the protein structure of transcription factors. As these regulatory mechanisms are less likely to 
generate the phenomena of cross-regulation that occur in diploid transcription regulation, they may not 
to be subject to under-dominance. Finally, it is important to reiterate that our study is only concerned 
with the evolution of negative autoregulation for noise reduction and faster response times. However, 
genes can achieve noise reduction through other means than autoregulation, and autoregulation can be 



used for other purposes than noise reduction 31-33 . We do not suggest that eukaryotes are exempt from 



the problem of noise. We do suggest that diploid gene networks, in contrast to those of haploids, must 
seek a different solution to the same problem. 



Conclusion 

Our work shows that regulatory interactions between homologous genes can generate deleterious effects 
that constrain the evolution of negative autoregulation. The predictions of our model show that the high 
incidence of autoregulation in E. coli and the dearth of negatively autoregulating genes in yeast, files 
and humans can be reconciled by taking into account a simple biological attribute — ploidy. Importantly, 
the difference between haploid and diploid regulation is not a mere correlate of the prokaryote-eukaryote 
divide. This was already suggested by the finding that the genetic networks of E. coli and yeast are — with 
the exception of their use of autoregulation — very similar [l3] and is further corroborated by the the fact 
that the predictions of our model are met by data on single and duplicated genes within E. coli. 

More generally, our work demonstrates that regulatory evolution can be considerably complicated by 
the presence of multiple copies of a gene in a cell, as is typically the case for eukaryotes. By explicitly 
considering the evolution of regulatory interactions, we have highlighted constraints that would not be 
evident from an analysis of the functional properties of an existing regulatory interaction in isolation — 
strong negative autoregulation quickens the response of genes to perturbation, but it is hard to evolve for 
this purpose due to under-dominance. This evolutionary perspective needs to be absorbed into attempts 
at unravelling the function of regulatory networks in higher organisms, a key problem for systems biology. 
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Methods 



Monte-Carlo simulations 

We used simulations of the molecular dynamics within a cell to determine the amout of intrinsic noise of 
autoregulating genes in diploids. A model that tracks the number of mRNA and protein molecules for 
a negatively autoregulating gene within a haploid cell is described in [7|. We generalised this to account 
for diploidy. The state of the system is described by the number of mRNA molecules rj, and the number 
of protein molecules pi produced from the two alleles i G {1,2}. The probability of a state {ri, ^2,^1,^2} 
is specified by the joint probability distribution ^^T■l,r2,Pl,P2(^)• The transition probabilities for the system 
to move between states due to changes in ri and pi (and, analogously, due to changes in r2 and P2) are 
given by 

{ri,r2,pi,P2} > {ri + l,r2,pi,P2}, 

{ri,r2,Pi,P2} {ri,r2,pi + l,P2}, 

{ri,r2,Pi,P2} {n - l,r2,pi,P2}, 



{ri,r2,Pi,P2} ^ {ri,r2,pi-l,P2}, 



where p = Pi + P2, h + 4>i{p) is the rate at which mRNA molecules are transcribed from allele 1, 7^ is the 
rate of mRNA degradation, kp is the rate at which mRNA is translated into protein and ■jp is the rate of 
protein degradation. As in the ODE model, (pi [p) is a function of the number of proteins p present in the 
cell, such that 



{P) 



1 + ^ 



where ko is the maximum rate of mRNA transcription, and Ki is the dissociation constant of the binding 
site of allele 1. 

To calculate response times we first determined the equilibrium expression level of the system from 
the average of 10^ replicate Monte-Carlo simulations. We then reduced mRNA and protein levels to a 
fraction a of the equilibrium level. The time for each replicate to return to equilibrium was measured 
and the average across the ensemble used as an estimate of the response time of the system. In order to 
determine how response times vary with the level of perturbation, simulations were run for values of a 
between and 1 in steps of 0.01. 

Simulations of binding site evolution 

Binding site evolution was modelled by generating a transcription factor binding motif with a length n 
nucleotides and an optimal base associated with each nucleotide. As in other models of TF-DNA binding, 
when a given nucleotide i was matched for for the optimal base it contributed an amount ej to binding 



energy, otherwise it contributed 20-22j. 

Binding site lengths were drawn from an empirical distribution generated from the binding motifs of 454 
eukaryotic transcription factors contained in the JASPAR CORE database [23]. The value of ei for each 
nucleotide was drawn from a uniform distribution in the interval (0,3). The optimal binding strength 
Kopt was determined numerically (see Appendix), using the values for the system parameters that are 
given in the legend of Figure 4. We excluded from our analysis any binding sites for which the total 
binding strength of the optimal sequence was too low to achieve the fastest response time (i.e., those 
sequences for which K = exp[— ^^e^] > Kopt). Evolution started from a state of minimum affinity (all 
nucleotides non-optimal) and proceeded through a series of single nucleotide substitutions. At each time 
step, a random mutation was introduced into the binding site sequence, switching one nucleotide from 
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the non-optimal to the optimal state. If the mutation resulted in a response time less than or equal 
response time of the resident, the mutant sequence was assumed to go to fixation in the population. 
Deleterious mutations that increased response times were assumed to be lost. The simulation was ended 
when no further advantageous mutations were available. Simulations were carried out for both haploids 
and diploids (for which response time of mutants was evaluated in the heterozygote state). 



Derivation of response times in haploids 

Here we derive results for the response time of a haploid autoregulating gene. We derive results for 
the general case in which autoregulation is described by a Hill function with arbitrary coefficient n (the 
analyses in the main text assumes n = 1). 

The set of ODEs describing transcription and translation of mRNA and protein at a single autoreg- 
ulating gene are analogous to those given for one allele in Eqs. 1 for a pair of autoregulating genes in a 
diploid. In order to simplify the analysis of the system we make the change of variables 



P 

Pmax Pmin 
Ipt, 



with L = — and 7 = — . The dynamics of the system can then be rewritten as 

Pmax pmin 

7^ = P + ks[q)-s, 
dq 



where B = — ^-^^ — = — ^^^^^ — = In general B <^ 1 since ki <C ko and 

'^max finin Pmax Pmin ^0 



ks{q) = ^^(^gy , (4) 



is the rescaled form of the repression function (j){p) described in the main text. Assuming that mRNA 
decays much faster than protein j^jj?] <^ jr, then 7 <C 1, it follows that T^f is small relative to ^ 
and we can assume that transcription output goes to equilibrium rapidly. That is, we can take 7^^ ~ 
and hence that the quasi equilibrium condition s /B + ks{q) holds. Substituting into Eqs. 3, generates 
a 2-dimensional system that is well approximated by the 1-dimensional system 

^={3 + ksiq)-q. (5) 



Small Perturbations 

The Lyapunov exponent associated with Eq. 5 at equilibrium gives the rate at which the system returns 
to equilibrium following a small perturbation. It is given by 



A 



1 + 



dkv 



dq 



(6) 



Eq. 6 is always negative. In what follows we will discuss only the magnitude of the Lyapunov exponent 
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|A| with the understanding that this quantity is always negative and therefore describes the rate at which 

will always 



dks 
dq 



the system returns to equilibrium. Prom Eq. 6 it is clear that a mutation which increases 
serve to decrease the Lyapunov exponent and thus increase the rate at which the system converges to 
equilibrium. 



Evolution of a new binding site 

We compare a wild- type binding site, with dissociation constant Li, to a mutant binding site with 
dissociation constant L2 such that Li > L2 — meaning that the mutant has a stronger binding site 
than the wild-type. At equilibrium, the protein concentrations satisfy 



<l2 



/3 + 
/3 + 



1 



1 



(7) 



It is simple to show that q\ > ^2 by differentiating, with respect to L. Thus, strengthening the autoreg- 
ulatory binding site (i.e., decreasing L) will lead to a decrease in the equilibrium protein concentration, 
and so with Li > L2 we always have q\ > (?2- To calculate the value of Lgpt for which |A| is maximum, 
we note that 



dks 
dq 



n 



ksiq)il - ksiq)). 



At equilibrium q = q* = ks{q*) + P, and the Lyapunov exponent can be written as 



n 



|A| = (l + ^(g*-^)(l-g*+/3) 



and we can find the value of q* that results in the largest Lyapunov exponent. This is given by 

q* = yW+P) ■ 



Translating this back into units of protein concentration, this means that the fastest response to small 
perturbations about equilibrium occurs when 

P ~ \/ PmaxPmin • (8) 



Thus, mutations which increase the strength of negative autoreguation, (and therefore decrease i^), will 
decrease response time provided the equilibrium protein concentration is > y/pmaxPmin, as discussed 
in the main text. The optimal binding site strength Kgpt can be determined by calculating the value of 
K which gives the optimal equilibrium protein concentration of Eq. 8. In the general case of arbitrary n, 
Kopt cannot be found analytically, but it can always be found numerically. 

The derivation of Kopt presented here is based on the assumption that perturbations of the system 
are small, in which case the dynamics of the system are well captured by its Lyapunov exponent. The 
optimal binding strength under perturbations of arbitrary size can be obtained by numerical integration 
of the system. As might be expected, the valnesKopt obtained in this way are similar to those calculated 
for small perturbations above. 
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Derivation of response times in diploids 

For diploids we proceed in the same way as for a single gene, and obtain a 1-D system for expression of 
a pair of alleles (with dissociation coefficients Li and Lj) 

^ = 2/3 + ksMij) + ksjiQij) - Qij . (9) 



where 

Evolution of a new binding site 



We now consider the response time of a pair of autoregulating alleles in a diploid. When an organism is 
homozygous, both binding sites have the same dissociation constant, Li and Eq. 9 is of the same form 
as Eq. 5 for a haploid, and the results for response time in haploids can be applied. When an organism 
is heterozygous however, the results for haploids do not hold. We compare the Lyapunov exponents of a 
heterozygote with dissociation constants Li and L2, where L2 < -Li, to a resident homozygote in which 
both binding sites have strength Li. At equilibrium the total protein concentrations satisfy 

Qu = 2/5 + 



QI2 = W+ 7^^ + ' (10) 

where q\i is the equilibrium protein concentration of the (resident) homozygote and q*^ equilibrium 
expression of the (mutant) heterozygote. It is simple to show that q\-^ > 5^2- by differentiating Eq. 10 
with respect to L2. 

Small perturbations 

Following a small displacement from equilibrium, under-dominance will occur if the heterozygote has a 
smaller Lyapunov exponent than the homozygote. The maximal Lyapunov exponent of the system is 
given by 



for the homozygote, and 



|Al2|= 1 + 4 (91,12 -^)(l-9l*,12+/5) 

+ 4 fel2 -/?)(!- 92,12 + (12) 



for the heterozygote, where qi^ij referes to allele i in a diploid carrying alleles i and j. We can observe that 

the squared difference in the mean allele expression, V, is given by F = ^2 ~ j + (^2 12 ~ ^ 
which can be expanded to give 

^= Kl2)^ + (92,12)^- 
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Substituting this expression for 

IA12I 

Note that Eq. 14 is of the same form as Eq. 13, with an additional term that depends on the ratio of the 
squared difference in allele expression, V to the total expression. We can define Xhom to be the Lyapunov 
exponent associated with a homozygotc of a given equilibrium expression and Xhet to be the Lyapunov 
exponent associated with a heterozygote of the same equilibrium expression and obtain Eq. 2 of the main 
text (with n = 1). 
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